I have a four word New Year’s resolution for Mystery Pollster: Shorter posts, more often. While I want to continue to look at polling methodology in-depth, I am hereby resolved to try to break longer subjects up into multiple posts and to try to post at least once every weekday. In that spirit, I want to continue looking at the topic of the Rasmussen automated survey and "interactive voice response" (IVR) polling in general, and this post will hopefully be the first of a series.
Today I want to correct and extend a graphic that compares the Rasmussen tracking of the Bush job rating to other surveys. About a month ago, with the help of Prof. Charles Franklin, I posted a chart that compared the Bush job rating as measured by the Rasmussen automated survey to the results of other surveys during 2005. Just before the Christmas break I discovered an error in some of the data used in that chart. The problem had to do with the circuitous route necessary to obtain Rasmussen data. Their website makes available only the most recent two weeks of results, saving their archive for paid subscribers. An MP reader and subscriber sent along what we both assumed were results from March to November 2005. It turns out the data he sent for March through June was actually from 2004, not 2005. So the original chart was in error, although amazingly, the error makes very little difference in the appearance of the graphic.
It is easier to show you what I mean. Here is the original but incorrect graphic, which used data from March through June 2004 to plot the Rasmussen trend line for March to June 2005:
Now here is a corrected version with the appropriate data from March and April of 2005:
Amazingly, there is very little difference. The "lowess" regression line in the correct version dips a bit in April, while the erroneous data gave us the impression of a straight line. The first two conclusions that I reached from the original graphic are unchanged:
First, the Rasmussen job approval numbers for President Bush were consistently higher than other polls from March through October. Second, Rasmussen seemed to pick up roughly the same downward trend between March and November.
In making the correction, Prof. Franklin noticed something else we had overlooked in creating the original graphic. We used data for "all other polls" starting in January, while we only had Rasmussen data available since March. It turns out this difference at the left side of the graphic had an impact on the appearance of the regression line at the right. Franklin generated a new graphic that makes an "apples to apples" comparison of data from March to November for both Rasmussen and all other polls.
Notice that the "other polls trend" line now shows the same upturn in early December as the Rasmussen survey. This is a different impression, obviously, than in the original graphic. The reason for the difference is that, as Franklin put it in an email, "adding two months of extra polling ‘smooths’ the lowess fit more. Fewer cases means that the fit is a little rougher." The main point: When we make an "apples to apples" comparison, the Rasmussen trend line is largely consistent with that of other polls even in late 2005, although it does show the Bush job rating to be consistently higher than other surveys.
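Franklin's point about span can be made concrete with a toy sketch (the poll counts below are illustrative, not the actual data): a lowess-style smoother fits each point using a fixed fraction of the series, so adding observations widens the absolute window and smooths the fit.

```python
def window_size(n_polls, frac=1/3):
    # A lowess-style smoother fits each point locally using a fixed
    # *fraction* of the data, so the absolute window grows with the
    # length of the series.
    return max(3, round(frac * n_polls))

# Illustrative counts only: a short series vs. one with two extra months
short_window = window_size(9)    # ~3 polls per local fit: a rougher curve
long_window = window_size(27)    # ~9 polls per local fit: a smoother curve
```

With the same fractional span, the longer series averages three times as many polls at each point, which is exactly the "smoothing" effect Franklin describes.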
We are not yet finished, however. Franklin and I discovered all of this because we were able to obtain Rasmussen job rating data going back to March 2004. The plot below of the two year trend now provides the most comprehensive picture yet of how the Rasmussen tracking of the Bush job rating compares with other polls:
While MP will have more to say about the Rasmussen survey in subsequent posts, two conclusions are immediately evident (and consistent with my first impression): (1) The Bush job approval percentage as reported by Rasmussen is consistently a bit higher than the average of other public surveys and (2) the trend is generally consistent over the long term.
Meanwhile, a few notes on the data we used to generate these graphics. An MP reader looked up and copied the Bush job rating data available to premium subscribers. Since Rasmussen releases data using a three-day rolling average (each daily release reports on interviews conducted over the last three nights), we plotted Rasmussen data for every third day.
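The every-third-day choice follows from the overlap in the rolling average; here is a minimal sketch with made-up approval numbers:

```python
def rolling_releases(nightly, window=3):
    # Each daily release reports the mean of the last `window` nights.
    return [sum(nightly[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(nightly))]

nightly = [48, 50, 52, 51, 49, 47, 50, 53, 54]  # hypothetical nightly numbers
releases = rolling_releases(nightly)

# Consecutive releases share two nights of interviews; every third
# release uses a disjoint set of nights, so those points carry no
# overlapping respondents.
independent = releases[::3]
```

Plotting every release would triple-count each night's interviews; taking every third one keeps the plotted points independent.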
As a check, I compared that data to the results posted roughly every other week by the site RealClearPolitics, and data points in common are now consistent for all periods, including March to June 2005. With two exceptions, this graphic omits Rasmussen data released between December 15, 2004 and January 30, 2005. As of late December, according to my source, that data was not available on the Rasmussen Reports archive. The two exceptions are the releases on January 12 and January 23, the two dates on which RCP posted results. I had asked Scott Rasmussen to make that missing data available. While he has kindly answered a series of questions I posed in December – many of which I will review in subsequent posts – he has not yet responded to my request for the missing data. Franklin gathered his data for all other polls from various public sources.
Nice catch, guys. That fills in the picture.
What happened to the Rasmussen approval numbers right around the 2004 election? It looks like they’re non-integer.
There are some smoothers that automatically choose span, but when comparing two separate series with wildly different numbers of observations I usually have to double-check that the spans give roughly the same amount of smoothing.
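One way to do that double-check, sketched with hypothetical observation counts: solve for each series' span so that both smooth over roughly the same absolute number of polls.

```python
def matched_frac(n_obs, target_window):
    # Pick the smoother's span (the fraction of points used per local
    # fit) so every series averages about the same *absolute* number of
    # observations, whatever its length.
    return min(1.0, target_window / n_obs)

frac_daily = matched_frac(250, 30)   # dense daily tracker -> small frac
frac_sparse = matched_frac(60, 30)   # occasional other polls -> large frac
```

Feeding each series its own matched fraction gives the two trend lines comparable amounts of smoothing, rather than letting the sparse series come out artificially rough.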
Several questions come to mind when looking at this data.
One is what factors would likely lead to Rasmussen having a higher-than-average number: human vs. machine interviewing, sampling pool, weighting, question wording or order, etc.? Because the real question behind the difference is which is more likely to be accurate. Both are showing the same trends, so we aren't measuring something totally different.
A second question is the makeup of the polls in the other-poll trend. Is the difference related to the Rasmussen polls or to what they are being compared against? Another way to look at it would be to draw best-fit lines for each poll, then see which lines most and least match Rasmussen. This would allow comparisons of polls and methodologies that might help to answer the first question. If it turns out that Zogby internet polls make up a large portion of the difference, then the question might be better directed toward why Zogby is different. It may also be that the other polls' trend lines keep crossing the Rasmussen line, and that the increased frequency of Rasmussen explains some of the differences. An interesting test of this would be to compare the periodic Rasmussen poll data from RealClearPolitics with the more complete Rasmussen data set. In theory, the two trend lines would match very closely.
Third, what is the relationship of the Rasmussen poll data to "real world" data, specifically the elections? This would again help answer the ultimate question behind the first one, namely, which gives a more accurate picture.
Finally, and this may be showing my great ignorance of statistics, how statistically significant is the difference? On the one hand, my eyeball look at the data says the difference between the two lines is within the usual margin of error. On the other, the fact that they consistently differ is interesting and would argue that the difference is significant.
Regarding the statistical significance of the difference between the Rasmussen and “other polls” trend lines, it is indeed true that the consistency of the difference across time effectively, almost by definition, rules out that the Rasmussen vs. “other” difference is random.
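That intuition can be quantified with a simple sign test, sketched here with hypothetical counts: if the two series differed only by chance, Rasmussen would read higher in about half of the matched periods.

```python
from math import comb

def sign_test_p(n, k):
    # One-sided binomial probability of seeing k or more
    # "Rasmussen higher" readings out of n matched periods
    # under a 50/50 null of no systematic difference.
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

# Hypothetical: Rasmussen higher in 18 of 20 matched periods
p = sign_test_p(20, 18)
```

Even with modest counts like these, the probability of so one-sided a run under the null is tiny, which is the formal version of "the consistency rules out randomness."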
Perhaps more interesting is the other superficially plausible explanation–that the Rasmussen procedure systematically excludes more anti-Bush people from the sample. For example, lower income workers–who might be more Democratic and therefore anti-Bush–might conceivably be impatient with or distrustful of Rasmussen's automated interactive format, and so exclude themselves in higher proportion than the more financially well off, and perhaps more Republican leaning, respondents. It's altogether hypothetical, but assume it, or some similar explanation, for the sake of the next argument.
Then, consider this implication. Rasmussen had a highly accurate prediction of the 2004 election, better than most polls, so the interactive procedure may be effectively screening out a disproportionate number of people who would be anti-Bush and yet would not vote (because if they had voted in the actual election, Rasmussen would have overestimated Bush’s support). If so, then it bears notice that across time the lower Bush’s approval in either set of polls, the greater is the discrepancy between Rasmussen and “other polls.” As a consequence, the more unpopular Bush is, the more the anti-Bush people decline the Rasmussen procedure. So then, if Rasmussen remains an accurate predictor of actual voting–we are forced to the conclusion that the more unpopular Bush is, the less likely his opponents are to vote.
While that may on the surface seem pessimistic for Democrats, it might have a silver lining: the Rasmussen procedure is good at identifying a group of people who–if they did vote–would reduce the overall Republican advantage by a good two or three percentage points.
Ironically, then, the Rasmussen procedure used in reverse might be an extremely cost-effective way of identifying high-yield targets for Democratic Get Out the Vote campaigns.
Rasmussen being a Republican and all, this might be an unintended consequence or side effect of his procedure.
I noticed that today the Rasmussen poll numbers were very close to the RCP polling average (44% vs. 43%). I also notice, looking at the charts above, that Rasmussen seems to be a hair ahead of the other-poll trend lines in changing direction. This may be because of the daily nature of Rasmussen, or because their methodology is tending to ferret out people's thoughts better.
As an aside, I wonder what the results would be if we took some "known activists" on both sides (probably easy enough to find by getting some bloggers) and had the various polling companies "poll" them. The poll results themselves wouldn't be interesting (we can guess that outcome easily), but then have the activists rate the polls for bias. Let them use their own criteria, but have them come up with a rating of how liberal, conservative, or neutral they felt the poll questions and proffered answers were. If the liberals felt a poll was neutral, but the conservatives felt it was biased to the left, then it might explain poll results that tended to the left (or vice versa). A poll that both sides thought was truly neutral would be a wonder to behold.
Bush Attempts to Hike Poll Numbers By Blowing Stuff Up, Promising to Blow More Stuff Up
Bush’s numbers are in the tank. Even Fox has him under 40, along with NBC, CNN, CBS, Pew and the AP. In the past few weeks, only Rasmussen has him over 40, but the Rasmussen typically has Bush rated 5-10 points higher than the median. Pew has him at 33…